All the pilot data remain on our FTP site under the pilot_data directory (EBI/NCBI). The variants discussed in the pilot paper can also be found on the FTP site (EBI/NCBI).
Please note these data are all mapped to the NCBI36 human reference.
The 1000 Genomes Project shares some samples with the HapMap project; any sample whose identifier starts with NA was likely part of the HapMap project. In the pilot stages of the project, HapMap genotypes were also used to help quality-control the data and to identify sample swaps and contamination. Since phase 1, the HapMap data have not been used by the 1000 Genomes Project, and all genotypes were independently identified by 1000 Genomes.
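The NA prefix rule above can be sketched in a few lines of Python. This is only an illustration of the heuristic as stated: the function name is hypothetical, the sample identifiers are used purely as examples, and a match means "likely", not "confirmed", HapMap membership.

```python
def likely_hapmap_sample(sample_id: str) -> bool:
    """Heuristic from the text: identifiers starting with 'NA' were likely in HapMap."""
    return sample_id.startswith("NA")


# Example identifiers, used here only to illustrate the prefix check.
samples = ["NA12878", "HG00096", "NA19240"]

likely_hapmap = [s for s in samples if likely_hapmap_sample(s)]
print(likely_hapmap)
```

The check is deliberately case-sensitive and prefix-only, since that is all the rule above states.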
The 1000 Genomes Project has run two different pull-down experiments. These are labelled as “exon targetted” and “exome”.
An “exon targetted” run is part of the pilot study, which targeted 1000 genes in nearly 700 individuals. The targets for this pilot can be found in the pilot_data/technical/reference directory.
An “exome” run is part of the whole exome sequencing project, which targeted the entirety of the CCDS gene set. The targets used for the phase 1 data release of 1092 samples can be found in technical/reference/exome_pull_down_targets_phases1_and_2; the targets for the phase 3 analysis can be found in technical/reference/exome_pull_down_targets/
The data directory represents the most up-to-date view of the sequence and alignment data available for the project. We also have a frozen data set, which represents the data that were aligned for the pilot project as published in Nature in 2010.
An important difference to note is that the main project data are all mapped to the GRCh37 assembly, whereas the pilot project was mapped to the NCBI36 assembly. Positions of variants and alignments reported in the pilot_data directory will therefore differ from those you see in the main project and in many genome browsers. Genome browsers and variant databases also display the 1000 Genomes variants re-mapped to GRCh38, which gives different coordinates again; you can still access GRCh37 on the Ensembl and UCSC genome browsers.
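As a rough guard against mixing coordinates from different assemblies, a script could record which reference each data set uses and refuse to compare positions across them. The sketch below assumes hypothetical data set labels ("pilot" and "main") and hypothetical function names; only the assembly names come from the text above.

```python
# Assembly used by each data set, as described above.
# The keys "pilot" and "main" are hypothetical labels chosen for this sketch.
ASSEMBLY_BY_DATASET = {
    "pilot": "NCBI36",
    "main": "GRCh37",
}


def same_assembly(dataset_a: str, dataset_b: str) -> bool:
    """Return True only if both data sets are mapped to the same reference."""
    return ASSEMBLY_BY_DATASET[dataset_a] == ASSEMBLY_BY_DATASET[dataset_b]


def require_same_assembly(dataset_a: str, dataset_b: str) -> None:
    """Raise ValueError rather than silently comparing mismatched coordinates."""
    if not same_assembly(dataset_a, dataset_b):
        raise ValueError(
            f"{dataset_a} is on {ASSEMBLY_BY_DATASET[dataset_a]} but "
            f"{dataset_b} is on {ASSEMBLY_BY_DATASET[dataset_b]}; "
            "positions are not directly comparable"
        )


require_same_assembly("main", "main")  # passes silently
```

Calling require_same_assembly("pilot", "main") raises a ValueError, which is the intended behaviour: positions from the two data sets need to be converted to a common assembly before they are compared.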